Add pass hoisting RT.await_future out of scf.forall loops #748

Merged: andidr merged 17 commits into main from andi/tiling-optimizations on Apr 9, 2024

andidr (Contributor) commented Mar 15, 2024

The new pass hoists `RT.await_future` operations whose results are
yielded by `scf.forall` operations out of the loops in order to avoid
over-synchronization of data-flow tasks.

For example, the following IR:

```
scf.forall (%arg) in (16)
  shared_outs(%o1 = %sometensor, %o2 = %someothertensor)
  -> (tensor<...>, tensor<...>)
{
  ...
  %rph = "RT.build_return_ptr_placeholder"() :
    () -> !RT.rtptr<!RT.future<tensor<...>>>
  "RT.create_async_task"(..., %rph, ...) { ... } : ...
  %future = "RT.deref_return_ptr_placeholder"(%rph) :
    (!RT.rtptr<!RT.future<...>>) -> !RT.future<tensor<...>>
  %res = "RT.await_future"(%future) : (!RT.future<tensor<...>>) -> tensor<...>
  ...
  scf.forall.in_parallel {
    ...
    tensor.parallel_insert_slice %res into %o1[..., %arg, ...] [...] [...] :
      tensor<...> into tensor<...>
    ...
  }
}
```

is transformed into:

```
%tensoroffutures = tensor.empty() : tensor<16x!RT.future<tensor<...>>>

%loopres:2 = scf.forall (%arg) in (16)
  shared_outs(%otfut = %tensoroffutures, %o2 = %someothertensor)
  -> (tensor<16x!RT.future<tensor<...>>>, tensor<...>)
{
  ...
  %rph = "RT.build_return_ptr_placeholder"() :
    () -> !RT.rtptr<!RT.future<tensor<...>>>
  "RT.create_async_task"(..., %rph, ...) { ... } : ...
  %future = "RT.deref_return_ptr_placeholder"(%rph) :
    (!RT.rtptr<!RT.future<...>>) -> !RT.future<tensor<...>>
  %wrappedfuture = tensor.from_elements %future :
    tensor<1x!RT.future<tensor<...>>>
  ...
  scf.forall.in_parallel {
    ...
    tensor.parallel_insert_slice %wrappedfuture into %otfut[%arg] [1] [1] :
      tensor<1x!RT.future<tensor<...>>> into tensor<16x!RT.future<tensor<...>>>
    ...
  }
}

scf.forall (%arg) in (16) shared_outs(%o = %sometensor) -> (tensor<...>) {
  %future = tensor.extract %loopres#0[%arg] :
    tensor<16x!RT.future<tensor<...>>>
  %res = "RT.await_future"(%future) : (!RT.future<tensor<...>>) -> tensor<...>
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %res into %o[..., %arg, ...] [...] [...] :
      tensor<...> into tensor<...>
  }
}
```
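To make the effect of the hoisting concrete, here is a minimal Python model of the two IR fragments above. It is a sketch under stated assumptions, not the pass itself: `concurrent.futures` futures stand in for `!RT.future` values, `pool.submit` for `RT.create_async_task`, `Future.result()` for `RT.await_future`, and each `scf.forall` is modeled as a sequential loop (the real loop may run its iterations in parallel). The helper `task_body` is hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def task_body(i: int) -> int:
    # Hypothetical stand-in for the work of one RT.create_async_task.
    return i * i


def before_hoisting(pool: ThreadPoolExecutor, n: int) -> List[int]:
    """Each iteration waits for its own task before the next iteration runs."""
    out = [0] * n
    for i in range(n):
        future = pool.submit(task_body, i)  # models RT.create_async_task
        out[i] = future.result()            # RT.await_future inside the loop
    return out


def after_hoisting(pool: ThreadPoolExecutor, n: int) -> List[int]:
    """The first loop only spawns tasks; a second loop awaits them."""
    # Plays the role of %tensoroffutures: one future per iteration.
    futures: List[Future] = []
    for i in range(n):
        futures.append(pool.submit(task_body, i))  # no waiting here
    out = [0] * n
    for i in range(n):
        out[i] = futures[i].result()               # hoisted RT.await_future
    return out


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(before_hoisting(pool, 16) == after_hoisting(pool, 16))
```

Both versions compute the same values; the difference is that the hoisted version creates all 16 tasks before waiting on any of them, so no iteration of the first loop blocks on its own task.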

andidr force-pushed the andi/tiling-optimizations branch 3 times, most recently from 7b07f30 to edc871d on April 4, 2024 09:55
BourgerieQuentin (Member) commented:

Could be good also to have check-tests for the hoisting pass

andidr added 11 commits April 8, 2024 12:02
This adds a new option `dump-fhe-df-parallelized` to
`concretecompiler` that dumps the IR after the generation of data-flow
tasks.
…ecific code

This introduces a new function `normalizeInductionVar()` to the static
loop utility code in `concretelang/Analysis/StaticLoops.h` with code
extracted for IV normalization from the batching code and changes the
batching code to make use of the factored function.
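As an illustration of what induction-variable normalization computes, the following Python sketch rewrites a loop with constant bounds `lb`, `ub` and a positive `step` into an equivalent loop running from 0 with step 1, together with the affine map that recovers the original IV. The names `StaticLoop` and `normalize_induction_var` are hypothetical; the actual signature of `normalizeInductionVar()` in `StaticLoops.h` is not shown here, and the C++ function operates on IR rather than on plain integers.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class StaticLoop:
    # Hypothetical model of a loop with constant bounds: i in [lb, ub) by step.
    lb: int
    ub: int
    step: int


def normalize_induction_var(
    loop: StaticLoop,
) -> Tuple[StaticLoop, Callable[[int], int]]:
    """Return an equivalent loop over [0, trip_count) with step 1, and a
    function mapping the normalized IV k back to the original IV."""
    if loop.step <= 0:
        raise ValueError("this sketch only handles positive steps")
    # Number of iterations: ceil((ub - lb) / step), clamped at zero.
    trip_count = max(0, -(-(loop.ub - loop.lb) // loop.step))
    normalized = StaticLoop(0, trip_count, 1)

    def to_original(k: int) -> int:
        # Original IV expressed in terms of the normalized one.
        return loop.lb + k * loop.step

    return normalized, to_original


if __name__ == "__main__":
    norm, to_orig = normalize_induction_var(StaticLoop(3, 20, 4))
    print(norm)                                  # StaticLoop(lb=0, ub=5, step=1)
    print([to_orig(k) for k in range(norm.ub)])  # [3, 7, 11, 15, 19]
```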
…ersion patterns

Some of the TFHE to Concrete conversion patterns implicitly assume
that operands are ciphertexts and thus that the converted types have a
higher number of dimensions than the original types. However, for
non-ciphertext types, the number of dimensions before and after the
conversion must be the same.

This commit adds a check to the respective conversion patterns that
triggers a simple, rank-preserving type conversion for non-ciphertext
types.
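The distinction can be sketched in Python as a toy model, not the actual pattern code. The `TensorType` class, the `"glwe"` element tag and both helpers are hypothetical, and we assume that a ciphertext element lowers to an extra trailing dimension of `lwe_size` 64-bit words, which is what makes its converted type one rank higher. The second helper shows a pattern testing the rank of the converted type instead of assuming that it grew.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TensorType:
    shape: Tuple[int, ...]
    element: str  # "glwe" marks a ciphertext element type (hypothetical tag)


def convert_type(t: TensorType, lwe_size: int) -> TensorType:
    if t.element == "glwe":
        # Ciphertext: each element becomes lwe_size i64 words -> one extra dim.
        return TensorType(t.shape + (lwe_size,), "i64")
    # Non-ciphertext: rank-preserving conversion (element type kept as-is here).
    return TensorType(t.shape, t.element)


def convert_extract(
    orig: TensorType, converted: TensorType, indices: List[int]
) -> Tuple[str, List[int], Optional[List[int]]]:
    """Lower an element extraction according to the converted rank."""
    if len(converted.shape) == len(orig.shape) + 1:
        # Ciphertext case: extract a whole ciphertext as a slice along the
        # extra trailing dimension.
        offsets = list(indices) + [0]
        sizes = [1] * len(indices) + [converted.shape[-1]]
        return ("extract_slice", offsets, sizes)
    # Same rank: indices carry over unchanged, no slicing needed.
    return ("extract", list(indices), None)
```

A pattern written only for the first branch would append a spurious offset and size for a non-ciphertext operand; checking the rank lets both cases share one pattern.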
andidr force-pushed the andi/tiling-optimizations branch from edc871d to c09e11c
andidr (Contributor Author) commented Apr 8, 2024

> Could be good also to have check-tests for the hoisting pass

Done.

andidr added 3 commits April 8, 2024 15:50
…th nested blocks

The current scheme used by reinstantiating conversion patterns in
`lib/Conversion/Utils/Dialects` for operations with blocks is to
create a new operation with empty blocks, to move the operations from
the old blocks and then to replace any references to block
arguments. However, such in-place updates of the types of block
arguments leave conversion patterns for operations nested in the
blocks unable to determine the original types of values from before
the update.

This change uses proper signature conversion for block arguments, such
that the original types of block arguments with converted types are
preserved, while the new types are made available through the dialect
conversion infrastructure via the respective adaptors.
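The difference between the two schemes can be illustrated with a toy Python model. It loosely mirrors MLIR's `TypeConverter::SignatureConversion` and the `OpAdaptor` passed to `OpConversionPattern::matchAndRewrite`, but `Value`, `ConversionState`, `lower` and their methods are hypothetical and are not the MLIR API. With an in-place update, the original type of a block argument is overwritten; with signature conversion, the original value keeps its type and the converted value is obtained through the adaptor.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Value:
    name: str
    type: str


def in_place_update(args: List[Value], convert: Callable[[str], str]) -> None:
    # Old scheme: the original type is overwritten and lost.
    for a in args:
        a.type = convert(a.type)


@dataclass
class ConversionState:
    # Maps original values (by name) to their replacements.
    mapping: Dict[str, Value] = field(default_factory=dict)

    def convert_signature(
        self, args: List[Value], convert: Callable[[str], str]
    ) -> List[Value]:
        # New scheme: create fresh arguments with converted types and record
        # the replacement; the original arguments are left untouched.
        new_args = []
        for a in args:
            new = Value(a.name + ".conv", convert(a.type))
            self.mapping[a.name] = new
            new_args.append(new)
        return new_args

    def adaptor(self, operands: List[Value]) -> List[Value]:
        # Like an OpAdaptor: hands a nested pattern the converted operands.
        return [self.mapping.get(v.name, v) for v in operands]


def lower(t: str) -> str:
    # Hypothetical converter: ciphertext tensors gain a trailing dimension.
    return "tensor<4x1025xi64>" if "glwe" in t else t
```

A pattern for a nested operation can then compare `operand.type` (original) with `state.adaptor([operand])[0].type` (converted), which is exactly the information the in-place scheme destroys.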
… bufferization

This adds support for `memref.alloc`, `memref.load`, `memref.store`,
`memref.copy` and `memref.subview` to the RT task bufferization pass.
andidr force-pushed the andi/tiling-optimizations branch from c09e11c to 7430587
andidr added 3 commits April 8, 2024 16:16
…oops

andidr force-pushed the andi/tiling-optimizations branch from 7430587 to f506f5f
andidr force-pushed the andi/tiling-optimizations branch from 278c9dc to f506f5f
andidr merged commit f506f5f into main on Apr 9, 2024
52 of 56 checks passed
andidr deleted the andi/tiling-optimizations branch April 9, 2024 13:44
3 participants